Comprehensive study on the Python-based regression machine learning models for prediction of uniaxial compressive strength using multiple parameters in Charnockite rocks

The strength of rock under uniaxial compression, commonly known as Uniaxial Compressive Strength (UCS), plays a crucial role in various geomechanical applications such as designing foundations, mining projects, slopes in rocks, tunnel construction, and rock characterization. However, sampling and preparation can become challenging in some rocks, making it difficult to determine the UCS of the rocks directly. Therefore, indirect approaches are widely used for estimating UCS. This study presents two Machine Learning Models, Simple Linear Regression and Step-wise Regression, implemented in Python to predict the UCS of Charnockite rocks. The models consider Ultrasonic Pulse Velocity (UPV), Schmidt Hammer Rebound Number (N), Brazilian Tensile Strength (BTS), and Point Load Index (PLI) as factors for forecasting the UCS of Charnockite samples. Three regression metrics, namely the Coefficient of Determination (R2), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE), were used to evaluate and compare the performance of the models. The results indicate a high predictive capability of both models. Notably, the Step-wise model achieved a testing R2 of 0.99 and a training R2 of 0.988 for predicting Charnockite strength, making it the more accurate of the two models. The analysis of the influential factors indicates that UPV plays a significant role in predicting the UCS of Charnockite.

Several researchers have proposed empirical equations for predicting the unconfined compressive strength (UCS) of different rock types using physico-mechanical parameters and non-destructive tests. For instance, Fakir et al. 1 developed equations for the granitoid rocks of South Africa, while Habib et al. 2 established strong correlations for sedimentary rocks in Algeria. Jalali et al. 3 prepared linear regression equations for igneous and metamorphic rocks from different locations in Iran using block punch index, cylindrical punch index, and UCS tests. Similarly, Kurtulus et al. 4, Son and Kim 5, Aldeeky and Hattamleh 6, Arman and Paramban 7, Bolla and Paronuzzi 8, and Chawre 9 found promising results in relating UCS to NDT tests such as Schmidt hammer, ultrasonic pulse velocity (UPV), and total sound-signal energy for different rock types. Furthermore, Mishra et al. 10 conducted mechanical, physical, and petrological studies on rock types in India and classified them accordingly. Aladejare et al. 11 compiled a dataset of experimental correlations between UCS and various other rock properties based on published literature, which can help in selecting a suitable regression equation for estimating the UCS of a rock site.
Various studies have established predictive models for determining the UCS of rocks using soft computing techniques. Wang et al. 12 developed a model using a random forest algorithm that showed consistent results with laboratory tests. Dadhich et al. 13 analyzed the efficacy of an ML model based on various features and concluded that random forest regression was the optimal method. Wang et al. 14 applied two machine learning models using non-destructive and petrographic studies and showed that the extreme gradient boosting model outperformed the random forest model in predicting UCS. Tang et al. 15 established a predictive model using an improved least squares tree algorithm and demonstrated its usefulness in engineering applications. Fattahi 16 introduced a new relevance vector regression (RVR) method enhanced by two algorithms to forecast the UCS of weak rocks, and found that the RVR optimized by the harmony search algorithm outperformed the one optimized by the cuckoo search. Lei et al. 17 conducted a comparative study of six hybrid prediction models based on the BP neural network, each paired with one of six swarm intelligence optimization algorithms. They found that FA-BP performed best among the models compared in predicting UCS.
Several studies have demonstrated the effectiveness of Artificial Neural Networks (ANN) and Machine Learning (ML) in predicting the UCS. Momeni et al. 18 showed that a particle swarm optimization-based ANN predictive model outperformed conventional ANN techniques in predicting UCS through direct and indirect estimation. Abdelhedi et al. 19 demonstrated that the combination of multiple linear regression and ANN effectively estimates the UCS values of carbonate rocks and mortar by correlating porosity, density, and ultrasonic measurements with UCS. Ozdemir 20 utilized artificial intelligence-based age-layered population structure genetic programming (ALPS-GP) and an ANN to estimate the UCS and found both methods to be effective. Azarafza et al. 21 proposed a DNN model and demonstrated its efficacy in obtaining the strength of marlstone. The model was also verified using classifiers like support vector machine, logistic regression, decision tree, loss function, MAE, MSE, RMSE, R-square, etc. Wei et al. 22 used the ANN approach to estimate the UCS of sedimentary rocks at the Thar Coalfields. Their findings indicated that the Brazilian tensile strength had the most significant influence on UCS estimation. Fang et al. 23 developed predictive equations based on various training algorithms and found that the ANFIS model outperformed the other models considered. Gupta and Nagarajan 24, Hassan and Arman 25, Liu et al. 26, Shahani et al. 27, and Qiu et al. 28 studied the performance of different machine learning models in predicting UCS. They suggested the best model based on factors such as absolute error, root mean square error, coefficient of determination, etc.
Over time, multiple techniques have been developed to predict the strength of rocks accurately. These techniques have proven effective for different types of rocks. This study aims to introduce two simple machine learning methods, the linear regression model and the step-wise regression model, implemented in Python to estimate the strength of Charnockite rocks. The models analyze the parameters BTS, PLI, N, and UPV to predict

Petrographic analysis
Macroscopic and microscopic observations were conducted to determine the type of rock. The rocks appeared dark grey to black in their fresh state, with a fine-to-medium-grained texture and an equigranular granoblastic homogeneous fabric without layering. It was difficult to distinguish the dark grey plagioclase from the mafic portion of the rock, and the presence of hornblende gave it a black hue. Thin sections of basic granulites exhibited granulitic texture, with minerals interfering with each other's growth. The common minerals constituting the basic granulites were labradorite, hypersthene, and augite, with a constant association of secondary hornblende, sometimes predominating over the pyroxenes. Black opaque minerals were present in negligible amounts, along with biotite and apatite. A faint gneissose structure, caused by the sublinear arrangement of mafic constituents, was observed in some of the slides. The dark color of Charnockite is caused by thin greenish or yellowish-brown veins and stringers throughout the rock, particularly in the feldspars but also in quartz and other minerals. Images of rocks under a petrographic microscope are displayed in Fig. 2. The primary and minor minerals observed from thin section analysis are shown in Table 2.

Laboratory testing methods
Various properties and parameters of rocks, including index properties, physical parameters, and destructive and non-destructive parameters, have been found to correlate with the UCS of different types of rocks. Equations have been established with high levels of accuracy by Bagherpour et al. 29 and Daoud et al. 30. However, Aladejare 31 suggests using these equations only for the same rock types they were developed for. To investigate the UCS of rocks in the Perungudi region in Chennai, 84 specimens were collected from different locations. These specimens were then transported to the laboratory for various tests, including the UCS test, BTS test, PLI test, Schmidt hammer rebound number (N) test, and UPV test, following ASTM standards. The specimens were prepared and tested in accordance with ASTM D4543-19 32 and ASTM D7012 33. Procedures for determining pulse velocities, rock hardness by the rebound hammer method, Brazilian tensile strength, and point load index followed ASTM D2845-08 34, ASTM D5873-95 35, ASTM D3967-95a 36, and ASTM D5731-02 37, respectively. All tests reported in this paper adhered to these standards (Fig. 3). The results of the various tests performed in the laboratory are shown in Table 3. The table provides statistical information on the properties of the rocks, which was used as a database for the study. Figure 4 depicts a histogram plot showing the variation of the properties.

www.nature.com/scientificreports/

Regression ML models using Python
In a recent study, Liu et al. 26 examined the UCS prediction capabilities of three different boosting machine models: adaptive boosting, category gradient boosting, and extreme gradient boosting. They compared the models' performance using five regression metrics. Another study by Xu et al. 38 introduced a novel prediction model called the SSA-XGBoost model. This model was more effective in predicting the UCS of rocks than six other models evaluated using RMSE, correlation coefficient, MAE, and variance interpretation. This paper presents two machine learning-based regression models, namely Linear Regression and Step-wise Regression, that can predict the UCS of Charnockite rocks. The models are constructed using data from 84 samples covering four parameters (UPV, N, BTS, and PLI) to find the UCS value of the Charnockite rock samples. In both models, 80% of the data are used for training with the help of Python libraries (scipy, numpy, pandas, seaborn, and matplotlib), and the model is fitted to the training data to estimate the coefficients and intercept. These values are needed to form the linear regression equation, from which the r value is obtained. The performance of the models was evaluated, and the superior UCS estimation model was reported.
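
As a minimal sketch of the workflow described above, the following Python snippet splits a dataset 80:20, fits a least-squares model on the training part, and reads off the coefficients, intercept, and r value. The data are synthetic stand-ins generated for illustration: the value ranges, the units, the assumed linear relation, and all variable names are assumptions, not the study's measurements. NumPy's least-squares solver is used here as one possible implementation; the authors' own code is not reproduced in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 84  # same sample count as the study; the values below are synthetic

# Hypothetical predictor ranges: UPV (km/s), N, BTS (MPa), PLI (MPa).
X = np.column_stack([
    rng.uniform(4.0, 6.5, n_samples),    # UPV
    rng.uniform(30.0, 60.0, n_samples),  # N
    rng.uniform(5.0, 20.0, n_samples),   # BTS
    rng.uniform(2.0, 10.0, n_samples),   # PLI
])
# Assumed linear relation plus noise, only so that there is something to fit.
y = -40.0 + X @ np.array([20.0, 0.5, 2.0, 3.0]) + rng.normal(0.0, 4.0, n_samples)

# 80:20 split on shuffled row indices.
order = rng.permutation(n_samples)
n_train = int(0.8 * n_samples)
train_idx, test_idx = order[:n_train], order[n_train:]

# Least-squares fit; the leading column of ones carries the intercept.
design = np.column_stack([np.ones(n_train), X[train_idx]])
beta, *_ = np.linalg.lstsq(design, y[train_idx], rcond=None)
intercept, coefficients = beta[0], beta[1:]

# Apply the fitted regression equation to the held-out 20%.
ucs_predicted = intercept + X[test_idx] @ coefficients
r_value = np.corrcoef(y[test_idx], ucs_predicted)[0, 1]

print("intercept:", intercept)
print("coefficients (UPV, N, BTS, PLI):", coefficients)
print("r on the test split:", r_value)
```

With 84 samples, an 80:20 split leaves 67 rows for training and 17 for testing, so the r value on the test split rests on a small number of points.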

Linear regression ML model
Linear regression is one of the most widely used machine learning models. The scikit-learn module is used in Python to build, train, and test the linear regression model. To estimate the unconfined compressive strength (UCS) of Charnockite rocks, we utilize the following rock properties: UPV, N, BTS, and PLI. These properties are taken from a dataset of 84 Charnockite rock samples and loaded into a Jupyter Notebook for prediction. To understand the dataset better, a pair plot can be generated. The dataset was split into two distinct sets: one for training and another for testing. This study used a split of 80:20, with 80% of the dataset allocated for training and the remaining 20% reserved for testing. A linear regression machine learning model was then created and trained on the training set using scikit-learn. Afterward, the model was applied to the test dataset to generate predictions. Scatter plots were created to compare the predicted values with the actual values, and the residuals were plotted to evaluate the performance of the model visually. The Python code considers all four parameters in both the training and test datasets. After training the model with the training data, the test values of the variables (UPV, N, BTS, and PLI) are passed to the trained model to obtain the predicted UCS. The correlation coefficient is then calculated by comparing the predicted UCS with the observed UCS of the test set. The findings are presented visually as charts and plots using Python's plotting libraries.
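
The steps above can be sketched with scikit-learn as follows. This is an illustrative outline under stated assumptions rather than the study's notebook: the data are synthetic, and the `random_state` values, units, and variable names are hypothetical choices. It uses `train_test_split` and `LinearRegression`, whose `score` method returns R2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 84-sample dataset (not the study's measurements).
rng = np.random.default_rng(42)
n_samples = 84
X = np.column_stack([
    rng.uniform(4.0, 6.5, n_samples),    # UPV, assumed in km/s
    rng.uniform(30.0, 60.0, n_samples),  # N
    rng.uniform(5.0, 20.0, n_samples),   # BTS, assumed in MPa
    rng.uniform(2.0, 10.0, n_samples),   # PLI, assumed in MPa
])
y = -40.0 + X @ np.array([20.0, 0.5, 2.0, 3.0]) + rng.normal(0.0, 4.0, n_samples)

# 80:20 split; random_state is fixed only to make the sketch repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set, then predict UCS for the held-out test set.
model = LinearRegression().fit(X_train, y_train)
y_pred_test = model.predict(X_test)

# Residuals (observed minus predicted) are what a residual plot would show.
residuals = y_test - y_pred_test

# Correlation between predicted and observed UCS, plus R2 on both splits.
r_test = np.corrcoef(y_test, y_pred_test)[0, 1]
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)

print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("r (test):", r_test, "R2 (train):", r2_train, "R2 (test):", r2_test)
```

The arrays `y_test`, `y_pred_test`, and `residuals` are the inputs one would pass to a plotting library to draw the predicted-versus-observed scatter plot and the residual plot mentioned in the text.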

Step-wise regression ML model
Step-wise regression is a method that involves step-by-step inclusion or exclusion of variables to create a regression model that explains the data with the minimum number of essential variables. This approach automatically selects the most significant variables and excludes the insignificant ones, which is an advantage over regression techniques that retain every variable. Initially, all four variables are considered, and at each step, the least significant variable is eliminated to improve the result and to determine the significance of the remaining variables.
Both models proved to be highly accurate methods for predicting the UCS of the rock. Figure 5 shows the scatter distributions between the variables in the linear regression model. Figure 6 shows the predicted values of the UCS against the observed values during the training and testing of the linear regression model. Figure 7 shows the predicted values of the UCS against the experimental values during training and testing of the step-wise regression model. The experiments were carried out in accordance with the ASTM standards on 84 Charnockite samples that were recovered from various parts of the Perungudi region. The samples were tested in the laboratory to assess properties such as UCS, UPV, N, BTS, and PLI. The results from the experiments were used to evaluate the performance of the models. The values predicted by the linear and step-wise regression models were compared with the observed values by regression. The results showed that both models correlated well with the observed values.
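
The step-wise procedure described above (start with all four variables and drop the least significant one at each step) can be sketched as a backward elimination loop. The text does not state which significance criterion or stopping rule was used, so the rule below, dropping a variable while its absolute t-statistic is under 2.0 (roughly a 5% level for this sample size), is an assumption. The data are synthetic, and N is deliberately given no effect in them so that the loop has a candidate to remove; that is an illustration choice, not a finding of the study. The function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares; returns coefficients (intercept first) and t-statistics."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = len(y) - design.shape[1]
    sigma2 = residuals @ residuals / dof
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    return beta, beta / np.sqrt(np.diag(covariance))

def backward_elimination(X, y, names, t_threshold=2.0):
    """Drop the weakest predictor until every remaining |t| meets the threshold."""
    keep = list(range(X.shape[1]))
    while keep:
        _, t_stats = fit_ols(X[:, keep], y)
        abs_t = np.abs(t_stats[1:])  # skip the intercept
        weakest = int(np.argmin(abs_t))
        if abs_t[weakest] >= t_threshold:
            break
        del keep[weakest]
    return [names[i] for i in keep]

# Synthetic data (not the study's measurements); N has zero true effect here.
rng = np.random.default_rng(7)
n_samples = 84
names = ["UPV", "N", "BTS", "PLI"]
X = np.column_stack([
    rng.uniform(4.0, 6.5, n_samples),    # UPV, assumed in km/s
    rng.uniform(30.0, 60.0, n_samples),  # N
    rng.uniform(5.0, 20.0, n_samples),   # BTS, assumed in MPa
    rng.uniform(2.0, 10.0, n_samples),   # PLI, assumed in MPa
])
y = -20.0 + X @ np.array([20.0, 0.0, 2.0, 3.0]) + rng.normal(0.0, 4.0, n_samples)

selected = backward_elimination(X, y, names)
print("retained predictors:", selected)
```

A p-value threshold or an information criterion such as AIC would be equally reasonable stopping rules; the loop structure stays the same and only the test inside it changes.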
Table 4 provides the empirical equations for estimating UCS with individual parameters. The analysis showed a good correlation between all parameters and the UCS, with UPV having the highest correlation and N having the lowest. The estimated R2 for the linear regression model was 0.98 and 0.986 for the training and test datasets, respectively, indicating close agreement between the predicted and observed values. Similarly, for the step-wise regression model, the estimated R2 was 0.988 for the training dataset and 0.99 for the test dataset.
Table 5 presents the regression metrics of the linear regression model and the step-wise regression model. The indices show the variations between the predicted and observed values. For the linear regression model, the estimated MAE and RMSE were 3.41 and 4.41, respectively, for the training dataset and 2.90 and 3.83, respectively, for the test dataset. For the step-wise regression model, the estimated MAE and RMSE were 3.63 and 4.66, respectively, for the training dataset and 2.71 and 3.60, respectively, for the test dataset. On the test dataset, the MAE and RMSE values were lower in the step-wise regression model than in the linear regression model, whereas on the training dataset they were slightly higher. The lower test errors indicate the better prediction capacity of the step-wise regression model on unseen data.
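
For reference, the three metrics compared above can be computed as in the following short sketch, written with the standard library only. The observed and predicted values are made-up numbers chosen for easy arithmetic and are not taken from Table 5; the function names are hypothetical.

```python
import math

def mean_absolute_error(observed, predicted):
    """MAE: average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def root_mean_square_error(observed, predicted):
    """RMSE: square root of the average squared difference."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)

def r_squared(observed, predicted):
    """R2: one minus residual sum of squares over total sum of squares."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical UCS values in MPa, for illustration only.
observed = [100.0, 120.0, 140.0, 160.0]
predicted = [102.0, 118.0, 143.0, 157.0]

mae = mean_absolute_error(observed, predicted)      # (2 + 2 + 3 + 3) / 4 = 2.5
rmse = root_mean_square_error(observed, predicted)  # sqrt(26 / 4), about 2.55
r2 = r_squared(observed, predicted)                 # 1 - 26 / 2000 = 0.987

print(mae, rmse, r2)
```

Because RMSE squares the errors before averaging, it is never smaller than MAE and grows faster when a few predictions are far off, which is why both are usually reported together.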

Conclusions
The strength of rock, along with other properties, plays a crucial role in civil projects. Many properties can be conveniently tested in both the laboratory and the field. In this research, we performed tests to measure the rock properties in the laboratory. We determined properties such as UPV, N, BTS, and PLI for Charnockite samples obtained from the Perungudi region in Chennai. We used Python to implement the Linear Regression ML model and the Step-wise Regression model to predict the Uniaxial Compressive Strength (UCS). Petrographic studies confirmed the rock as Charnockite, displaying high percentages of Quartz, Feldspar, Hypersthene, and Hornblende, with slight traces of mica and Sillimanite. The statistical analysis showed that UPV had the most significant effect on the UCS. We evaluated the models using R2, MAE, and RMSE, which showed high accuracy for estimating the UCS using these methods. Among the models, the Step-wise regression model showed the best results for

Figure 1. Map showing the location of Perungudi in Chennai.

Figure 2. Images of rock under a petrographic microscope.

Figure 3. Laboratory tests on a rock specimen.

Figure 5. Pair plot of the database.

Figure 6. Regression line for ML linear regression model.

Table 1. Equations for estimation of UCS using PLI, BTS, N

Table 2. Findings from the analysis of rocks under a thin section.

Table 3. Engineering properties of Charnockite rocks.

Table 4. Equation for UCS estimation from various parameters.

Table 5. Regression metrics table for the models.

forecasting the UCS. Comparing the results of the two models, the Step-wise regression model, with R2 = 0.99, MAE = 2.71, and RMSE = 3.60, showed the best performance for estimating the UCS of Charnockite rocks.